ABSTRACT


We develop statistical models to identify the most influential entry-level attributes of a 
Marine recruit to predict two performance measures: the Computed Tier Score and the 
time to achieve the rank of Corporal (E-4) in the 0621 Field Radio Operator Military 
Occupational Specialty (MOS). We use data collected from 2007 through 2014 on more 
than 1,100 Marines in the 0621 MOS to construct multivariate linear regression models to 
estimate Marines’ Computed Tier Score and time to achieve E-4 based on their individual 
personal and professional attributes. 

We find that statistically significant relationships exist between the entry-level 
attributes of a Marine recruit and the performance measures. The most influential 
predictor variables include the run time on the USMC Initial Skills Test (IST), number of 
crunches on the IST, rifle score, the Armed Services Vocational Aptitude Battery 
(ASVAB) General Technical (GT) score, ASVAB Clerical (CL) score, ASVAB General 
Science (GS) score, ASVAB Mathematics Knowledge (MK) score, ASVAB Paragraph 
Comprehension (PC) score, weight, and whether a Marine receives a weight waiver upon 
entrance into service. We recommend that new job performance measures be created for 
each high-density MOS in order to conduct further testing for MOS suitability. 
EXECUTIVE SUMMARY


Each year, the United States Marine Corps (USMC) accesses thousands of new recruits 
into a variety of career fields, an assignment process that has significant implications for 
the USMC and the individual Marine’s future career path. The USMC expends 
considerable manpower and time ensuring that annual Military Occupational Specialty 
(MOS) recruiting targets are met while trying to best match each recruit to those 
requirements. This research aims to provide the Marine Corps with an understanding of 
relationships between entry-level attributes of Marine recruits and two performance 
measures in order to better select the right recruits for each MOS. We develop statistical 
models to identify the most influential entry-level attributes of a Marine recruit in 
predicting two performance measures: the Computed Tier Score captured at the time of 
re-enlistment eligibility, and the time to achieve the rank of Corporal (E-4) in the 0621 
Field Radio Operator MOS in the USMC. 

Using data collected from 2007 through 2014 on more than 1,100 Marines in the 
0621 MOS, multivariate linear regression models are developed to predict a Marine’s 
Computed Tier Score and time to achieve E-4 based on their individual personal and 
professional entry-level attributes. These attributes, which include physical 
characteristics, test scores, physical fitness measures, education, and waiver information, 
comprise the independent variables in the study. This study answers the following 
questions: 

1. Do significant relationships exist between entry-level attributes of a 
USMC recruit and the USMC Computed Tier Score or the time for a 
Marine to achieve the pay grade of E-4? 

2. What are the most influential independent variables that predict the 
Computed Tier Score and the time to promotion to E-4 in a particular 
MOS field? 

3. What insight does this analysis provide in terms of recommending changes 
to the current entrance criteria for the 0621 Field Radio Operator MOS? 

4. What direction should a future study take to examine ways in which the 
matching of USMC recruits to MOS fields can be improved? 
We find that statistically significant relationships do exist between the entry-level 
attributes of a Marine recruit and the USMC Computed Tier Score, as well as the time to 
achieve the pay grade of E-4 within the 0621 MOS in the USMC. Entry-level attributes 
of Marine recruits can be utilized to predict these dependent variables. Additionally, we 
recommend that this analysis be conducted on an annual basis, and not pooled into a 
multi-year study, at least for the near future. 

The most influential predictor variables that allow prediction of the Computed 
Tier Score are found to be the Initial Skills Test (IST) run time, IST crunches, rifle score, 
the Armed Services Vocational Aptitude Battery (ASVAB) General Technical (GT) 
composite score, weight of a Marine, and whether a Marine received a weight waiver 
upon entrance into service. We find the most influential predictor variables for predicting 
the time to achieve the pay grade of E-4 to be IST crunches, IST run time, rifle score, the 
ASVAB General Science (GS) subscore, ASVAB Mathematics Knowledge (MK) 
subscore, ASVAB Paragraph Comprehension (PC) subscore, ASVAB 
Clerical/Administrative (CL) composite score, and whether a Marine receives a weight 
waiver upon service entrance. While IST crunches, IST run time, rifle score, and weight 
provide insight into the predicted time to achieve the pay grade of E-4, the variables GS, 
MK, PC, and CL_SCORE offer intriguing evidence that the USMC should further 
explore these variables for inclusion in the entrance criteria of a Field Radio Operator. 

To explore other measures of MOS suitability that could help predict a successful 
match, we find that new suitability measures need to be developed. It is the 
recommendation of this study that new job performance 
measures be created for each high-density MOS in order to conduct further testing for 
MOS suitability. With the development of new success or job performance measures, this 
study can be replicated using the new job performance measures as the dependent 
variable for analysis. 
I. INTRODUCTION

A. MOTIVATION AND OBJECTIVES 

Each year, the United States Marine Corps (USMC) accesses thousands of new 
recruits into a variety of career fields, an assignment process that has significant 
implications for the USMC and the individual Marine’s future career path. While the 
process takes into account both the recruit’s preferences as well as the needs of the 
Marine Corps, it is clear that there is scope for making the assignment process more 
efficient. More specifically, there is continued desire to ensure that recruits are best 
matched to the right Military Occupational Specialty (MOS). Matching recruits to the 
MOS in which they will most likely succeed and perform at a high level improves 
not only the quality of each MOS as a community, but the USMC as a whole. 

The USMC spends considerable manpower and time ensuring that annual MOS 
recruiting targets are met while trying to best match each recruit to those requirements. 
Currently, the USMC utilizes various entrance criteria to ensure that Marines are 
qualified to enter a specific MOS field. Headquarters Marine Corps (HQMC), D.C. 
Manpower and Reserve Affairs (M&RA) is investigating ways to improve the career 
field assignment process and seeks to explore the possible relationships between recruit 
attributes and potential indicators of success in the assigned MOS field. 

This research aims to provide the Marine Corps with a better understanding of 
relationships between recruit attributes and possible indicators of success in a particular 
MOS in order to better select the right recruits for the right MOS. Through identification of 
key attributes that lead to success, the USMC can modify the current MOS assignment 
process in order to utilize the right human capital while meeting the needs of the Marine 
Corps. More specifically, entrance criteria for specific MOSs can be changed or validated 
to ensure the Marines with the highest likelihood of success are placed in the appropriate 
MOS. This research could also be used to help the USMC decide how to allocate recruits 
to specialties to meet numerical targets in those specialties. 
B. FOCUS OF THE RESEARCH


The primary focus of this research is to develop a research concept, data 
collection plan, and repeatable methodology that improve M&RA’s understanding of 
relationships between recruit attributes and their success within the assigned MOS. The 
end-state goal is to determine the entry-level recruit attributes that lead to the most 
success in specific MOSs in order to validate or recommend change to the current 
entrance criteria for high-density or priority-fill MOSs. 

This study focuses specifically on the 0621 Field Radio Operator MOS, due to the 
stringency of the entrance requirements for this MOS, the technicality of the 
requirements necessary to perform successfully in the MOS, and existence of a 
significant yearly sample. During the course of this study, statistical models are 
constructed to estimate the relationships between entry-level attributes and two measures 
of perceived success within the 0621 MOS. The models are based on a set of variables or 
attributes that are available through a USMC Manpower database known as the Total 
Force Data Warehouse (TFDW). 

Our investigation is organized as follows: First, we conduct exploratory analysis 
of the data to identify data characteristics and relationships, such as missing or invalid 
observations. We do this in order to obtain a basic understanding of the relationships 
between variables. Next, we use linear regression to construct models to predict two 
possible dependent variables: time to achieve the pay grade of E-4 and the USMC 
Computed Tier Score. The Computed Tier Score is a quantitative performance metric that 
provides commanders an assessment of an individual Marine’s performance for 
re-enlistment eligibility. Finally, we make recommendations for future study that will 
provide the most benefit to the career field assignment process. 
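
As a point of reference, the regression step described above follows the standard multiple linear regression form. The symbols are notational assumptions introduced here for illustration only: y_i stands for the dependent variable of Marine i (either the Computed Tier Score or the time to achieve E-4), x_ij for the j-th entry-level attribute of that Marine, beta_j for the coefficient to be estimated, and epsilon_i for the error term.

```latex
y_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, \dots, n
```

Under this form, an attribute is "influential" when its estimated coefficient is statistically distinguishable from zero and materially changes the predicted value.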

This study answers the following study questions: 

1. Do significant relationships exist between entry-level attributes of a 
USMC recruit and the USMC Computed Tier Score or the time for a 
Marine to achieve the pay grade of E-4? 
2. What are the most influential independent variables that predict the 
Computed Tier Score and the time to promotion to E-4 in a particular 
MOS field? 

3. What insight does this analysis provide in terms of recommending changes 
to the current entrance criteria for the 0621 Field Radio Operator MOS? 

4. What direction should a future study take to examine ways in which the 
matching of USMC recruits to MOS fields can be improved? 

C. ORGANIZATION OF THIS THESIS 

This thesis is organized as follows. In Chapter II, we review literature on the 
career assignment process in the USMC, and we discuss the methodologies and findings 
of those studies. Additionally, Chapter II provides a detailed background into the current 
process for career assignment of enlisted Marines and an overview of the current MOS 
entrance criteria. Chapter III describes the data and methodology used to conduct this 
study. It includes a description of the data used for analysis and explanation of the data 
collection and cleansing process. Chapter IV discusses the results and the analysis used to 
obtain them. Chapter V provides conclusions of this study and 
recommendations for future work. 
II. BACKGROUND


A. THE PROCESS FOR CAREER ASSIGNMENT OF ENLISTED MARINES IN THE U.S. MARINE CORPS

Each branch of the Armed Services uses specific entrance criteria for screening 
recruits and assigning them to an MOS. Headquarters Marine Corps (HQMC) Manpower 
and Reserve Affairs (M&RA) conducts detailed analyses in order to determine the 
manning requirements for each MOS and to meet the current needs of the Marine Corps. 
Based on these manning requirements, recruits are then assigned into the required 
occupational specialties to match the demand. Prerequisites for entrance into specific 
MOS fields are defined in the Marine Corps Order (MCO) 1200.17E, the Military 
Occupational Specialties Manual (Short Title: MOS Manual) (USMC, 2013). The 
prerequisites for each MOS were originally constructed to match the best 
recruit to each occupational field, but are not necessarily updated when the job specialties 
change. 

Traditionally, the Armed Services Vocational Aptitude Battery (ASVAB) 
composite test scores have been the most important delineating factor in matching an 
individual to an MOS. A recruit’s test scores, background information (citizenship, security 
clearance eligibility, etc.), preferences, and the needs of the Marine Corps are considered 
in the determination of MOS assignments. Marine recruits are assigned an Intended MOS 
(IMOS) approximately two weeks prior to graduating basic training. They are then 
forwarded to their assigned MOS school for initial MOS training. Upon graduating from 
MOS school, each Marine is officially assigned his or her Primary MOS (PMOS) 
designator. 

B. THE ARMED SERVICES VOCATIONAL APTITUDE BATTERY (ASVAB) AND MOS ENTRANCE CRITERIA

This section describes the Armed Services Vocational Aptitude Battery 
(ASVAB), which is a series of examinations that the Armed Forces use to set 
requirements to enter into service in the U.S. military. For the U.S. Marine Corps, the 
ASVAB also determines entrance requirements that must be met in order to be assigned 
to a particular MOS. 

1. ASVAB Components 

The U.S. military has been screening potential recruits for aptitude since World 
War I. In 1976, all military services began using the ASVAB for both screening potential 
recruits for service entrance and assigning them to military occupations. Combining the 
selection and classification testing into one exam made the testing process more efficient 
while also enabling the military services to better match recruits to MOSs. The ASVAB 
has been revised many times in order to reduce inefficiencies and correct problems with 
misnorming (History of Military Testing, n.d.). 

The ASVAB is comprised of ten subtests, each of which provides its own score. 
There are two versions of the ASVAB, a paper and pencil (P&P) version and a 
computerized adaptive test (CAT) version. The P&P-ASVAB combines two of the 
subtests, Auto Information (AI) and Shop Information (SI), into one single test, Auto and 
Shop Information (AS). These subtests are displayed in Table 1 (ASVAB Fact Sheet, 
n.d.). 

Possible recruits are screened for entrance into the military by calculating a 
composite score, called the Armed Forces Qualification Test (AFQT). The AFQT is a 
composite score that incorporates the following four ASVAB subtests: Paragraph 
Comprehension (PC), Word Knowledge (WK), Mathematics Knowledge (MK), and 
Arithmetic Reasoning (AR). The AFQT score is reported as a percentile between 1 and 99, 
which indicates the percentage of examinees that scored at or below the percentile score 
(ASVAB Scoring, n.d.). The current minimum AFQT score for entrance into the USMC 
is 32 for high school graduates and 50 for persons with a GED (ASVAB Scoring, n.d.). 
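
The screening rule quoted above can be sketched as a simple threshold check. This is a minimal illustration only: the function name and the credential labels are hypothetical, and the two thresholds are the values cited in the text rather than an authoritative statement of current policy.

```python
# Minimum AFQT percentiles quoted in the text (ASVAB Scoring, n.d.).
# The dictionary keys are hypothetical labels chosen for this sketch.
AFQT_MINIMUM = {"high_school_graduate": 32, "ged": 50}


def meets_afqt_minimum(afqt_percentile: int, credential: str) -> bool:
    """Return True if the AFQT percentile meets the minimum for the credential."""
    if not 1 <= afqt_percentile <= 99:
        raise ValueError("AFQT is reported as a percentile between 1 and 99")
    return afqt_percentile >= AFQT_MINIMUM[credential]


if __name__ == "__main__":
    print(meets_afqt_minimum(45, "high_school_graduate"))  # True
    print(meets_afqt_minimum(45, "ged"))                   # False
```

The same percentile can therefore pass or fail the screen depending on the education credential.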
Table 1. ASVAB subtests (after ASVAB Fact Sheet, n.d.) 


Test                             Description
-------------------------------  ----------------------------------------------------------
General Science (GS)             Knowledge of physical and biological sciences
Arithmetic Reasoning (AR)        Ability to solve arithmetic word problems
Word Knowledge (WK)              Ability to select the correct meaning of a word presented
                                 in context and to identify best synonym for a given word
Paragraph Comprehension (PC)     Ability to obtain information from written passages
Mathematics Knowledge (MK)       Knowledge of high school mathematics principles
Electronics Information (EI)     Knowledge of electricity and electronics
Auto Information (AI)            Knowledge of automobile technology
Shop Information (SI)            Knowledge of tools and shop terminology and practices
Mechanical Comprehension (MC)    Knowledge of mechanical and physical principles
Assembling Objects (AO)          Ability to determine how an object will look when its
                                 parts are put together
2. MOS Entrance Criteria


The Marine Corps uses four other composite scores derived from the ASVAB 
subtest scores for determining entrance or assignment for recruits into occupational 
specialties. The four USMC composite scores are General Technical (GT), Mechanical 
Maintenance (MM), Electronics (EL), and Clerical/Administrative (CL). Each composite 
score is formulated from a combination of various ASVAB subtest scores (USMC, 2009). 
The composite scores and their derivations are shown in Table 2. 


Table 2. U.S. Marine Corps ASVAB composite scores 
(after Classification Testing, 2009) 


Composite Scores                   Score Derivation
---------------------------------  -------------------
General Technical (GT)             WK + PC + AR + MC
Mechanical Maintenance (MM)        AR + EI + MC + AS
Electronics (EL)                   AR + MK + EI + GS
Clerical/Administrative (CL)       WK + PC + MK
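
The derivations in Table 2 can be expressed as a small lookup. This is a sketch under stated assumptions: Table 2 lists only which subtests enter each composite, and any weighting or rescaling onto the reported composite scale is not described here, so the function below returns plain sums. The function name and the example subtest scores are made up for illustration.

```python
# Composite definitions transcribed from Table 2.
COMPOSITES = {
    "GT": ("WK", "PC", "AR", "MC"),
    "MM": ("AR", "EI", "MC", "AS"),
    "EL": ("AR", "MK", "EI", "GS"),
    "CL": ("WK", "PC", "MK"),
}


def composite_sums(subtests: dict) -> dict:
    """Sum the subtest scores that enter each composite, per Table 2."""
    return {
        name: sum(subtests[code] for code in parts)
        for name, parts in COMPOSITES.items()
    }


if __name__ == "__main__":
    # Made-up subtest scores for illustration only.
    example = {"GS": 50, "AR": 55, "WK": 52, "PC": 54,
               "MK": 58, "EI": 49, "AS": 47, "MC": 51}
    print(composite_sums(example))
```

Because AR enters three of the four composites, a change in that single subtest moves GT, MM, and EL together.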


The U.S. Marine Corps assigns recruits to a particular MOS based on specific 
entrance criteria or prerequisite requirements. These entrance criteria vary by MOS, and 
are set to best match recruits with the right skill sets, knowledge base, physical ability, 
and aptitude levels to a corresponding MOS. The job descriptions, prerequisite 
requirements, and MOS requirements for each MOS are outlined in MCO 1200.17E 
Military Occupational Specialties Manual (Short Title: MOS Manual) (USMC, 2013). 
Descriptions, prerequisites, and requirements for the 0621 MOS (Field Radio Operator) 
and the 0311 MOS (Rifleman) are outlined by the MOS Manual as follows: 
MOS 0621, Field Radio Operator PMOS 


a. MOS Description: Field Radio Operators employ radios to send and receive 
messages. Typical duties include the set up and tuning of radio equipment 
including antennas and power sources; establishing contact with distant stations; 
processing and logging of messages; making changes to frequencies or 
cryptographic codes; and maintaining equipment at the first echelon. Skill 
progression training for Sergeant and Corporal is Radio Supervisors Course. 

b. Prerequisites 

(1) Must be a U.S. Citizen. 

(2) Must possess an EL score of 105 or higher. 

(3) Must possess a valid state driver's license. 

(4) Security requirement: Secret security clearance eligibility. 

c. Requirements. Complete the Field Radio Operator (FROC) Course (after 
USMC, 2013). 


MOS 0311, Rifleman PMOS 

a. MOS Description: The Riflemen employ the modern service rifle/carbine, the 
M203 grenade launcher and the squad automatic weapon (SAW). Riflemen are 
the primary scouts, assault troops, and close combat forces available to the Marine 
Corps Air Ground Task Force (MAGTF). They are the foundation of the Marine 
infantry organization, and as such are the nucleus of the fire team in the rifle 
squad, the scout team in the LAR squad, scout snipers in the infantry battalion, 
and reconnaissance or assault team in the reconnaissance units. Noncommissioned 
Officers are assigned as fire team leaders, scout team leaders, rifle squad leaders, 
or rifle platoon guides. 

b. Prerequisites. Must possess a GT score of 80 or higher. 

c. Requirements. Complete the Marine Rifleman Course at the School of Infantry 
(after USMC, 2013). 


These two MOS descriptions are provided to emphasize that each USMC MOS 
has different job descriptions, prerequisites, and requirements. For the purposes of this 
study, it is important to note the prerequisite requirements for entrance into a specific 
MOS. These prerequisites are the criteria that the USMC uses to classify a recruit into an 
MOS. 
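
To make the contrast between the two sets of prerequisites concrete, the checks quoted above can be sketched as follows. The `Recruit` class, its field names, and the function names are hypothetical and introduced only for this illustration; the thresholds are those quoted from the MOS Manual excerpts above.

```python
from dataclasses import dataclass


@dataclass
class Recruit:
    """Hypothetical container for the attributes the two prerequisites refer to."""
    el_score: int
    gt_score: int
    us_citizen: bool
    has_drivers_license: bool
    secret_clearance_eligible: bool


def qualifies_0621(r: Recruit) -> bool:
    """Prerequisites quoted for MOS 0621, Field Radio Operator."""
    return (r.us_citizen
            and r.el_score >= 105
            and r.has_drivers_license
            and r.secret_clearance_eligible)


def qualifies_0311(r: Recruit) -> bool:
    """Prerequisite quoted for MOS 0311, Rifleman."""
    return r.gt_score >= 80


if __name__ == "__main__":
    recruit = Recruit(el_score=100, gt_score=95, us_citizen=True,
                      has_drivers_license=True, secret_clearance_eligible=True)
    print(qualifies_0621(recruit))  # False: EL score is below 105
    print(qualifies_0311(recruit))  # True: GT score is at least 80
```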
C. LITERATURE REVIEW 

This section reviews previously conducted studies on career assignment and 
related subjects that are of interest to the manpower community. More specifically, 
studies in the following areas are reviewed: military career assignment and the 
relationship between ASVAB testing and performance in an MOS. 

1. Previous Studies on Career Assignment 

Rautio (2011) examines standards used to screen recruits for assignment to the 
communications field in the USMC. He discusses the relationship between ASVAB 
composite scores and success measures at the communications occupational field 
schools. The data used for analysis covers 9,921 Marines from fiscal year 2006 through 
fiscal year 2009. The author develops multivariate probit regression models that include 
all four years of data encompassing multiple MOS fields. The probit models determine 
the effects of ASVAB composite scores and other measures of performance on success at 
the communications schools (Rautio, 2011). 

Rautio (2011) considers models that use the following predictor variables: 
Gender, Race, Ethnicity, Marital Status, Number of Dependents, Primary MOS, Fiscal 
Year, Armed Forces Qualification Test (AFQT) Score, ASVAB composite scores, 
Education Level, Proficiency Score, and Conduct Score. The dependent variable 
identifies whether a Marine successfully completed the initial communications MOS 
school. Rautio (2011) finds that the ASVAB Electronic composite score (EL Score) has a 
significantly positive effect on the probability of success at the communications schools. 
The author also cites other variables that have a positive effect on the probability of 
success such as marital status, ethnicity, and the ASVAB Clerical composite test score. 
He also finds that gender and education level are statistically significant contributors to 
the prediction of success. 
2. Studies on the Relationship between ASVAB Testing and Performance in an MOS


The Center for Naval Analyses (CNA) conducted a multi-year study (Carey, 
1993) for the Marine Corps Job Performance Measurement (JPM) project in order to 
construct valid measures for job performance and to determine the relationship between 
the ASVAB and Marine job performance. The study was conducted due to concern by 
Congress that a significant number of unqualified and low aptitude personnel had entered 
into military service during the 1970s. This concern was supported by CNA studies that 
discovered a misnorming of the ASVAB that resulted in 360,000 recruits entering into 
service that would have been declared ineligible if the ASVAB test scores had been 
accurate. In 1981, Congress mandated that each service perform a Job Performance 
Measurement (JPM) project in order to relate ASVAB scores to on-the-job performance 
(Carey, 1993). This study develops new measures for performance and success in order 
to study the relationships to predictor variables. 

CNA executed two phases of the study between 1986 and 1990. The first phase 
(1986 to 1987) focuses on job performance measurement for infantry MOSs. The second 
phase (1990) focuses on job performance measures for the mechanical maintenance field 
(Carey, 1993). Our discussion is limited to the infantry MOS phase of the study. 

The infantry MOS study maps the job duties of five infantry MOSs based on the 
Marine Corps Individual Training Standards (ITS), now included in the USMC Training 
and Readiness (T&R) Manual, for infantry occupations. The study proposes job 
performance measures, hands-on performance tests (HOPTs) and job knowledge tests 
(JKTs) that were developed to directly test job duties as outlined by the ITS. Carey 
(1993) finds that the HOPTs proposed by CNA were effective measures of job 
performance due to their strong agreement with actual job performance based on the 
requirement that an examinee perform job-related tasks under realistic but standardized 
conditions. The JKTs are designed to be a parallel test to the HOPTs and include written 
exam knowledge testing of items related to job performance. This study notes that 
standardized HOPTs are expensive and difficult to develop and implement. CNA 
concludes that while the HOPTs should serve as the benchmark for measuring job 
performance, JKTs provide promising replacements for setting enlistment standards. 
Marine Corps Proficiency marks (PRO marks) are also considered in the study but found 
to provide less fidelity to actual job performance when compared to the HOPTs and JKTs 
(Carey, 1993). 

In an earlier CNA study, Mayberry (1990) investigates the relationship between 
these JPMs and ASVAB composite scores, with particular focus on the General 
Technical (GT) composite score. Mayberry focuses primarily on the GT score because 
the Marine Corps uses this score to determine eligibility for the infantry occupational 
field. 

More than 2,300 infantrymen from five infantry MOSs were tested over the 
course of two days. Examinees were administered both the JKTs and the HOPTs. The 
results from the performance testing are then modelled in order to determine if 
relationships exist between aptitude, as indicated by the ASVAB composite scores, and 
MOS performance. 

Mayberry (1990) finds a strong relationship between individual aptitude level and 
later performance of critical MOS tasks. This study provides a useful measure of MOS 
performance and determines that the ASVAB composite scores provide significant 
indicators of performance within an MOS. 

D. CHAPTER SUMMARY 

The U.S. Marine Corps MOS Assignment Process attempts to assign the most 
qualified recruits with the most potential for success to the right MOS. The USMC uses 
entrance criteria to assign those recruits to an MOS while meeting the needs of the 
Marine Corps. Based on the literature reviewed, ASVAB composite scores lend themselves well to 
predicting success during MOS school and within the assigned MOS. Additionally, these 
studies suggest that the EL composite score is a good predictor of performance in the 0621 
Field Radio Operator MOS. Our study focuses on predicting success or performance in a 
specific MOS while in the operating forces. 
III. DATA AND METHODOLOGY


A. THE DATA 

This section gives a detailed explanation of the data collection process, the 
original data gathered for study purposes, and the preparation of the data in order to 
conduct a useful analysis. 

1. Data Summary 

The data used in our research is obtained from the USMC’s Total Force Data 
Warehouse (TFDW). TFDW is a database of personnel records for Manpower & Reserve 
Affairs. TFDW contains historical information for active duty and reserve Marines in the 
USMC. For the purposes of this study, data is pulled from TFDW for all active duty 
enlisted Marines with the 0621 MOS designator that entered into active service during 
the Fiscal Years of 2008 through 2010, or from 1 October 2007 through 30 September 
2010. 

The data includes personal and professional information including physical 
characteristics, physical fitness performance scores, education information, 
demographics, waivers received, ASVAB test scores, promotions, marksmanship scores, 
and legal information. Some data fields provide a snapshot in time of the Marine’s career 
profile and are updated whenever the information changes, while other data fields are 
populated each month. Lastly, there are data fields that are populated only once, such as 
information gathered upon entering service. Table 3 gives details on the initial sample 
obtained for this study. Duplicate observations, which we determined to result from 
changes in enlistment dates that do not affect other fields, were removed. 
Table 3. Summary of Marines entering service in FY2008-FY2010 for the 0621 MOS


Fiscal Year    Observations in Original Sample    Sample with duplicates removed
-----------    -------------------------------    ------------------------------
FY2008         429                                377
FY2009         433                                384
FY2010         510                                466
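
The duplicate removal summarized in Table 3 can be sketched as keeping one record per Marine. This is an assumption-laden illustration: the field names `marine_id` and `enlist_date` are hypothetical, and the text does not state which of the duplicated records is retained, so the sketch simply keeps the first one encountered.

```python
def drop_duplicate_marines(records: list) -> list:
    """Keep the first record seen for each (hypothetical) marine_id key."""
    seen = set()
    unique = []
    for rec in records:
        key = rec["marine_id"]
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    # Made-up records: the same Marine appears twice with different enlistment dates.
    sample = [
        {"marine_id": 1, "enlist_date": "2007-10-15"},
        {"marine_id": 1, "enlist_date": "2007-11-01"},
        {"marine_id": 2, "enlist_date": "2008-01-07"},
    ]
    print(len(drop_duplicate_marines(sample)))  # 2
```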


2. Data Formatting and Cleaning 

This section explains the procedures taken to prepare the data for analysis 
including an explanation of the observations that were removed from the analysis and the 
grouping of categorical variables. 

a. Observation Removal, Variable Substitution, and Censoring 

In order to properly build relevant analytical models, the historical information for 
each Marine’s record should contain complete information for each of the predictor 
variables. When missing or invalid information exists for a predictor variable, we remove 
those records from the analysis. 

When there are missing values for the dependent variables included in the study, 
we first determine if there is a valid reason and possible substitution value for the 
variable. If no valid substitute exists, the records with missing values are removed. 
Reasons for missing values include Marines separated from service prior to the 
conclusion of their enlistment and deployment waivers. These exclusions are shown in 
Table 4. 

The first dependent variable considered, the Computed Tier Score, is calculated as 
a combination of seven sub-variables. The Computed Tier Score is discussed in greater 
detail later in this chapter. One of the sub-variables of the Computed Tier Score, martial 
arts belt level, contains missing values in the data for 2008 and 2009. We decided to 
replace these missing values with the belt level closest to the median belt level of all 
records. We assume that each Marine has received, at minimum, the median belt level 
due to USMC requirements to achieve certain belt levels during career progression. 
Additionally, this assumption has minimal impact on the overall values of Computed Tier 
Score, but allows us to include more observations for analysis. The number of 
observations with a substitute value for martial arts belt level is included in Table 4. 
Fiscal Year 2010 did not contain any observations that required substitution for martial 
arts belt level. 
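
The substitution rule for missing martial arts belt levels can be sketched as follows. The ordinal coding of the belts, the function name, and the tie-breaking rule (the lower belt wins when the median falls exactly between two levels) are assumptions made for this illustration and are not taken from the study.

```python
from statistics import median

# Belt levels in ascending order; the integer coding by list position is an
# assumption of this sketch.
BELT_LEVELS = ["tan", "gray", "green", "brown", "black"]


def impute_belt_levels(belts: list) -> list:
    """Replace missing entries (None) with the belt closest to the median belt."""
    codes = [BELT_LEVELS.index(b) for b in belts if b is not None]
    med = median(codes)
    # min() returns the first (lowest) level on a tie.
    nearest = min(range(len(BELT_LEVELS)), key=lambda k: abs(k - med))
    fill = BELT_LEVELS[nearest]
    return [fill if b is None else b for b in belts]


if __name__ == "__main__":
    print(impute_belt_levels(["tan", "gray", None, "green", "gray"]))
```

Only the missing entries change; every observed belt level is left as recorded.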

The second dependent variable included in the analysis is the time in days that it 
takes for a Marine to promote to the pay grade of E-4, or time2E4. After removal of 
records for Marines separated from active service, missing values still exist for Marines 
that are not promoted to E-4 prior to completion of their first four years of active service. 
Although these Marines never achieved the pay grade of E-4 during their period of 
observation in the study, a censored value is substituted for these records for time2E4 in 
order to retain these important observations for study. The censored time2E4 value is 
equal to one plus the maximum observed time for the cases that we considered, which is 
1570 days. Twenty records from the FY2010 data (approximately five percent of the total 
number of records) have time2E4 set to this censoring value. Figure 1 shows a histogram 
of time2E4 for the FY2010 data, in which the twenty censored values are apparent at the 
far right. These values would be a continuation of the right-hand tail if the values were 
not censored. The numbers of observations with censored values for time2E4 for each 
year of study are shown in Table 4. 
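
The censoring rule described above can be sketched in a few lines. The function name and the use of `None` to mark a Marine who was not promoted within the observation period are assumptions of this illustration; the rule itself (one plus the maximum observed time) is the one stated in the text.

```python
def censor_time2e4(times: list) -> tuple:
    """Replace missing promotion times (None) with max observed time plus one."""
    observed = [t for t in times if t is not None]
    censor_value = max(observed) + 1
    censored = [censor_value if t is None else t for t in times]
    return censored, censor_value


if __name__ == "__main__":
    # Made-up promotion times in days; None marks a Marine not promoted to E-4.
    values, cutoff = censor_time2e4([700, None, 1200, 950, None])
    print(values, cutoff)
```

Because every censored record receives the same value, these records stack in a single bar at the right edge of a histogram rather than spreading along the tail.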
Figure 1. Histogram of the number of days to promotion to the pay grade of E-4 (time2E4) for FY2010 entries to the 0621 MOS


Table 4. Summary of data formatting and cleaning 


          Total          Observations        Observations   Observations with      Observations
Fiscal    Observations   removed due to      used in        substitute value for   with censored
Year                     separation from     analysis       martial arts belt      time2E4
                         service                            level
------    ------------   ---------------     ------------   --------------------   -------------
2008      377            26                  351            23                     32
2009      384            30                  354            14                     25
2010      466            45                  421            0                      20


With the removal of observations from the data set as described above, the 
remaining data set consists of 1,126 Marines across three fiscal years. 

b. Grouping of Categorical Data 

The categorical variables considered for analysis were screened to ensure they 
contained a sufficient number of different categories for use as potential predictor 
variables. 


3. Assumptions and Limitations of the Data 

One of the purposes of this study is to analyze the process of career assignment in 
order to provide valuable recommendations for future occupational classification. It is the 
intention of this study that the modeling techniques and recommendations be suitable for 
the current manpower selection and assignment process in the USMC. Therefore, we seek 
to develop modeling techniques and supporting methodology that can be applied to a 
broad range of MOSs, particularly the high-density MOSs, in order to gain a better 
understanding of the overall picture of career placement. This study focuses on the 0621 
Field Radio Operator MOS. We do not consider an optimization problem placing recruits 
into the various MOSs; instead, we focus on the entry attributes that may indicate a 
successful match with an occupation. 

The Marine Corps uses the Computed Tier Score for re-enlistment purposes. The 
Computed Tier Score is a quantitative measurement for re-enlistment eligibility. This 
study does not attempt to evaluate the validity of the Computed Tier Score as a measure 
of performance or re-enlistment suitability. 

B. VARIABLE DESCRIPTIONS 

This section provides descriptions of the variables considered for analysis. All 
variables that have potential for correlation with success, re-enlistment, and MOS 
suitability are included. Additionally, only those variables obtainable through TFDW are 
analyzed. 

1. Dependent Variables 

The dependent variables considered for analysis are the USMC Computed Tier 
Score and time (in days) to promote to the pay grade of E-4, or Corporal, in the USMC. 

a. Computed Tier Score 

The USMC uses two measures for determining eligibility for re-enlistment, the 
Computed Tier Score and the Commander’s Tier Recommendation. The Computed Tier 
Score was originally introduced in May 2011 through MARADMIN 273/11. It was 
created in order to provide commanders a quantitative assessment of an individual 
Marine’s performance. The Computed Tier Score is calculated using the scores from a 
Marine’s physical fitness test (PFT), combat fitness test (CFT), proficiency and conduct 
markings, and the rifle range qualification score. Additionally, points are awarded for 
USMC martial arts belt level and for meritorious promotions to the current rank. The 
Computed Tier Score is then compared to all Marines within the same MOS that are 
eligible for re-enlistment during the same fiscal year. An example of a Marine Corps Tier 
Worksheet is shown in Figure 2. 


[Figure 2 (image not reproduced): sample Tier Worksheet for a Corporal with PMOS 0621. 
The worksheet lists the MOS average and the Marine's own scores for PFT, CFT, Proficiency, 
Conduct, Rifle, MCMAP belt level, and Meritorious Promotion, together with legal history 
and a tier chart: Tier I (10%), 91%-100%; Tier II (30%), 61%-90%; Tier III (50%), 11%-60%; 
Tier IV (10%), 1%-10%.] 

Figure 2. USMC Tier Worksheet (after GySgt B. Lodge, USMC, Personal 
Communication, September 10, 2014). 

The raw scores from the PFT, CFT, and Rifle Qualification are not weighted or 
altered in the calculation of the Computed Tier Score. The Proficiency and Conduct 
markings are multiplied by 100, and each Marine Corps Martial Arts Program (MCMAP) 
belt level is associated with a specific point value when added to the total score. Finally, 
Marines that have been meritoriously promoted to their current rank receive an additional 
100 point bonus, as long as they have no misconduct on their record within the previous 
six months of promotion (GySgt B. Lodge, USMC, Personal Communication, September 
10, 2014). These point values are then summed together for the final calculation of the 
Computed Tier Score. As seen in Figure 2, Marines are then evaluated against their 
re-enlistment cohort and placed into Tiers 1-4, based on their respective percentile. For the 
purposes of this study, we use the non-categorized Computed Tier Score as a quantitative 
variable for analysis. 
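
The scoring rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the official USMC computation: the text says that each MCMAP belt level carries a specific point value but does not list those values, so the belt points below are hypothetical placeholders, as are the function names and the example scores. The percentile bands follow the tier chart on the Tier Worksheet.

```python
# Hypothetical MCMAP belt point values: the text states that each belt level
# has a specific point value but does not give the values themselves.
ASSUMED_BELT_POINTS = {"tan": 0, "gray": 10, "green": 30, "brown": 40, "black": 50}
MERITORIOUS_BONUS = 100  # bonus stated in the text


def computed_tier_score(pft, cft, rifle, proficiency, conduct, belt,
                        meritorious=False, recent_misconduct=False):
    """Sum the components of the Computed Tier Score."""
    score = pft + cft + rifle  # raw scores are used unweighted
    score += round(100 * proficiency) + round(100 * conduct)
    score += ASSUMED_BELT_POINTS[belt]
    if meritorious and not recent_misconduct:
        score += MERITORIOUS_BONUS
    return score


def tier_from_percentile(percentile):
    """Map a cohort percentile (1-100) to a tier using the worksheet bands."""
    if percentile >= 91:
        return "I"
    if percentile >= 61:
        return "II"
    if percentile >= 11:
        return "III"
    return "IV"


# Hypothetical Marine: PFT 274, CFT 284, rifle 303, 4.3/4.3 markings, green belt.
example_score = computed_tier_score(274, 284, 303, 4.3, 4.3, "green")
```

In this study only the numeric score is used as a response variable; the tier mapping is shown to make the percentile bands concrete.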

In order to calculate the Computed Tier Score for each Marine or observation in 
the study, we capture the data for each component and generate the score using the 
aforementioned algorithm. All data captured for the computation of the Computed Tier 
Scores are taken on July 1 of the fiscal year prior to a Marine’s end of active service 
(EAS). This date is chosen because it marks the first day that Marines can apply for 
re-enlistment, and mirrors the process that the USMC uses to offer re-enlistment. 

b. Time to Achieve E-4 

The second dependent variable we consider is time (in days) to achieve the pay 
grade of E-4, or Corporal, in the USMC. We choose this metric due to the high 
significance of achieving this rank in the USMC and the possible correlation to 
performance within a Marine’s specific MOS. This variable is referred to as time2E4 in 
the regression model outputs used in the analysis. 

2. Independent Variables 

Table 5 contains a list of all independent variables that were considered in this study. 





Table 5. Description of independent variables used in analysis 

Variable Name         Type          Description
AGE                   Numerical     Age of Marine upon entering service
GENDER                Categorical   Gender of Marine
HEIGHT                Numerical     Height upon entering service
WEIGHT                Numerical     Weight upon entering service
IST_CRUNCHES          Numerical     Number of crunches for Initial Skills Test
IST_RUN               Numerical     Run time (in seconds) for 1.5-mile run for Initial Skills Test
RIFLE_SCORE           Numerical     Initial rifle score during basic training
WAIV_TRAFFIC          Binary        Received waiver for a traffic-related offense prior to service
WAIV_MINOR.NONTRAFF   Binary        Received waiver for a minor non-traffic-related offense prior to service
WAIV_MISCOND          Binary        Received waiver for a misconduct offense prior to service
WAIV_DRUGSUBST        Binary        Received waiver for drug or substance usage prior to service
WAIV_WEIGHT           Binary        Received waiver for being over the weight requirement prior to service
WAIV_ICD9             Binary        Received waiver for medical reasons
WAIV_OTHER            Binary        Received waiver for other reasons not captured
GS                    Numerical     ASVAB GS subscore
MK                    Numerical     ASVAB MK subscore
PC                    Numerical     ASVAB PC subscore
AR                    Numerical     ASVAB AR subscore
AS                    Numerical     ASVAB AS subscore
WK                    Numerical     ASVAB WK subscore
MC                    Numerical     ASVAB MC subscore
EI                    Numerical     ASVAB EI subscore
GT_SCORE              Numerical     ASVAB GT composite score
MM_SCORE              Numerical     ASVAB MM composite score
CL_SCORE              Numerical     ASVAB CL composite score
EL_SCORE              Numerical     ASVAB EL composite score

C. METHODOLOGY 

This section explains the techniques used to conduct the statistical analyses, 
variable transformations, variable selection methods, and model validation techniques. 
The following concepts are basic to fitting linear models, and the reader is referred to a 
reference such as Faraway (2005) for further statistical background. 

1. Multivariate Linear Regression 

In order to address our study questions, we use statistical models to determine if a 
significant relationship exists between the independent variables and the dependent 
(response) variable. The response variables in this study are continuous, and are analyzed 
separately against the independent variables. Multivariate linear regression models are 
used for explaining the relationship between a single dependent variable $Y$, commonly 
called the response, and multiple independent variables (predictors), $X_1, \ldots, X_p$ 
(Faraway, 2005, p. 6). 

In a linear regression model, the continuous response variable $Y$ is modeled in 
terms of $p$ independent variables $X = \{X_1, X_2, \ldots, X_p\}$. The general form for a 
multivariate linear regression model is: 

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

where $\beta = \{\beta_0, \beta_1, \ldots, \beta_p\}$ are unknown parameters, or coefficients, that are 
associated with the independent variables. $\beta_0$ is the intercept term, and $\varepsilon$ is the 
prediction error, or random error term that has no relationship to $X$ (Faraway, 2005, p. 11). 
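
To make the model form concrete, the following is a minimal least-squares sketch in standard-library Python (the study itself fits its models in R). It estimates the coefficients by solving the normal equations, which is one common way to obtain the least-squares fit and is adequate for a small, well-conditioned example. The function names and the data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bp] minimizing the residual sum of squares."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_ols(rows, y)
```

Because the example data contain no error term, the fitted coefficients recover the intercept and the two slopes up to floating-point rounding.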

2. Variable Transformation 

Transforming the dependent or independent variables can often improve the fit of 
a model and correct violations of model assumptions. It is important to explore the 
possibility of improving a model by transforming the variables included, particularly the 
dependent variable. While transforming the variables used in analysis may make the 
results difficult to interpret upon initial inspection, it can provide a better model fit 
(Faraway, 2005). 

The Box-Cox transformation family is used in our study to determine an 
appropriate transformation of the response variable. The Box-Cox family transforms the 
response variable $y \to g_\lambda(y)$, where the transformation indexed by $\lambda$ is as follows 
(Faraway, 2005, pp. 110-111): 

$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \text{when } \lambda \neq 0 \\ \log(y), & \text{when } \lambda = 0 \end{cases}$$

The best values of $\lambda$ and the regression parameters are determined using maximum 
likelihood. 
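
The transformation itself is simple to compute. The sketch below implements the Box-Cox family and its inverse in Python for a single positive value; it does not perform the maximum likelihood search for the best exponent. The function names are hypothetical.

```python
import math


def box_cox(y, lam):
    """Box-Cox transformation of a positive value y with exponent lam."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_inverse(z, lam):
    """Map a transformed value back to the original scale."""
    if lam == 0:
        return math.exp(z)
    return (lam * z + 1.0) ** (1.0 / lam)
```

The inverse is what allows predictions made on the transformed scale to be reported in the original units, such as days.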


3. Variable Selection 


In developing statistical models, it is important to consider variable selection in 
order to determine the best subset of independent variables to be included in the model. 
Introducing too many independent variables (“overfitting”) reduces the overall predictive 
power of the model. In order to find the best set of independent variables for analysis and 
to reduce the possibility of overfitting, we use Best Subsets Regression (Faraway, 2005, 
pp. 127-128). Best Subsets Regression finds the best set of predictors for a given subset 
size, and then chooses the subset size to optimize a criterion such as adjusted $R^2$. 
Adjusted $R^2$ is defined as follows (Faraway, 2005, p. 127): 

$$R_a^2 = 1 - \frac{RSS/(n-p)}{TSS/(n-1)}$$

where 

$$RSS = \text{residual sum of squares} = \sum_i (y_i - \hat{y}_i)^2, \qquad TSS = \text{total sum of squares} = \sum_i (y_i - \bar{y})^2,$$

$n$ is the number of observations in the data set, and $p$ in this formula is the number of 
estimated parameters in the model (the predictor variables plus the intercept). 
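
As a small worked check, adjusted R-squared can be computed from the ordinary R-squared, since RSS/TSS equals one minus R-squared. The Python sketch below assumes that the parameter count in the denominator includes the intercept; that reading is an assumption here, but it reproduces the adjusted value reported with the Computed Tier Score model in Chapter IV (multiple R-squared of 0.1273 with 421 records and eight predictors). The function name is hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared computed from R-squared = 1 - RSS/TSS."""
    n_params = n_predictors + 1  # assumption: predictors plus the intercept
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_params)


# Values reported for the Computed Tier Score model (Figure 4).
tier_model_adjusted = adjusted_r_squared(0.1273, 421, 8)
```

Because the inputs are rounded, the result agrees with the reported 0.1103 only to about three decimal places.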
Cross-validation can be used to select the subset size. This is done by randomly 
selecting a given percentage of the data (e.g., ten percent) as a test set, fitting the model 
on the remaining data, and then calculating the sum of squared errors when the fitted 
model is used to predict the test set. This procedure can be repeated many times to obtain 
a better estimate of how well the model is able to predict new data. The number of predictor 
variables used in the model is selected to minimize the estimated mean squared 
prediction error. We use Best Subsets Regression with cross-validation, taking out a 
randomly selected subset of ten percent of the observations each time for use as a test set, 
repeating this procedure ten times. 
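
The repeated hold-out procedure can be sketched as follows. To stay short, this Python illustration compares only two candidate models (an intercept-only model and a one-predictor line) rather than running a full best-subsets search; the data are simulated, and all function names are hypothetical.

```python
import random


def fit_mean(xs, ys):
    """Intercept-only model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Simple linear regression with one predictor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def holdout_mse(fit, xs, ys, test_fraction=0.10, repeats=10, seed=0):
    """Mean squared prediction error over repeated random hold-out sets."""
    rng = random.Random(seed)
    n = len(xs)
    n_test = max(1, round(test_fraction * n))
    total_sq_err, total_count = 0.0, 0
    for _ in range(repeats):
        test_idx = set(rng.sample(range(n), n_test))
        train = [i for i in range(n) if i not in test_idx]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total_sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in test_idx)
        total_count += n_test
    return total_sq_err / total_count


# Hypothetical data: a linear trend plus seeded Gaussian noise.
data_rng = random.Random(1)
xs = [float(i) for i in range(100)]
ys = [5.0 + 0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]
mse_by_model = {"mean only": holdout_mse(fit_mean, xs, ys),
                "one predictor": holdout_mse(fit_line, xs, ys)}
best_model = min(mse_by_model, key=mse_by_model.get)
```

Using the same seed for every candidate means each model is scored on the same hold-out sets, which makes the comparison between subset sizes fairer.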

4. Regression with a Censored Outcome Variable 

We consider regression using the number of days for a Marine to be promoted to 
the pay grade of E-4 (time2E4) as an outcome variable. As we discussed in section 
A(2)(a) above, in a number of cases the Marine did not achieve this promotion in the 
observable time period. These cases are “right censored” with the maximum observable 
time used to represent these values. Their actual promotion times are greater than the 
censored values. A regression model with censored values in the outcome variable can 
be estimated taking censoring into account. We use the survreg function in the survival 
package in R to fit these models. Because diagnostic tools are much better developed for 
uncensored regression, we use uncensored regression first and then compare the results to 
those obtained using the survreg function. 
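
The idea behind regression with a right-censored outcome can be illustrated through the log-likelihood: an uncensored observation contributes its density, while a censored observation contributes the probability that the true value exceeds the censoring value. The Python sketch below assumes Gaussian errors purely for simplicity; this is an assumption of the illustration and not a statement of the error distribution used with survreg in this study. The function name and the example values are hypothetical.

```python
import math


def censored_gaussian_loglik(ys, fitted, censored, sigma):
    """Log-likelihood of right-censored data under Gaussian errors."""
    total = 0.0
    for y, mu, is_censored in zip(ys, fitted, censored):
        z = (y - mu) / sigma
        if is_censored:
            # Right-censored: we only know that the true value exceeds y,
            # so the contribution is the log of the survival function.
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
        else:
            total += -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * z * z
    return total


# Hypothetical observed times, fitted values, and censoring flags.
ys = [700.0, 900.0, 1570.0]
fitted = [750.0, 850.0, 1200.0]
loglik_censored = censored_gaussian_loglik(ys, fitted, [False, False, True], 200.0)
loglik_naive = censored_gaussian_loglik(ys, fitted, [False, False, False], 200.0)
```

A fitting routine would maximize this quantity over the regression coefficients and the scale parameter; treating the censored value as an exact observation, as in the second call, penalizes the fit for a gap that the data do not actually establish.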

5. Model Validation 

It is important to validate a statistical model to ensure that the model provides 
meaningful results. This section explains the techniques used to validate the linear 
regression models. 

The validity of the regression model depends on adherence to several key 
assumptions. These model assumptions need to be validated using regression 
diagnostics. The model assumptions are listed as follows: 

1. The errors are independent, exhibit constant variance, and are normally 
distributed. 

2. The structural part of the model is correct. 

3. Unusual observations are not overly influential in the model (Faraway, 
2005, p. 53). 

The regression diagnostics are conducted using a set of diagnostic plots that 
allows for examination of these model assumptions. 

6. Software Used for Analysis 

The R programming language is used (R Development Core Team, 2014) for the 
analyses performed in this study. 

D. CHAPTER SUMMARY 

This chapter provides a detailed explanation of the data and methodologies used 
in order to conduct this analysis. Data formatting, observation removal, and data cleaning 
procedures are utilized in order to prepare the data for viable statistical modeling. The 
independent and dependent variables for consideration are modeled using multivariate 
linear regression while considering necessary variable transformations. Finally, the 
variable selection methods and model validation techniques are outlined for use in 
directing this analysis. 

IV. RESULTS AND ANALYSIS 


We present the results of fitting the statistical models that are described in 
Chapter III. Two response variables are considered separately: the Computed Tier Score 
calculated near the time that Marines are eligible for re-enlistment (about 2.5 years into 
the initial enlistment); and, the number of days required for an enlistee to make 
promotion to the pay grade of E-4. These response variables are taken as measures of 
success of an enlistee’s placement in the 0621 MOS. For both response variables, we use 
data on USMC first enlistments in the 0621 MOS for FY2010. This is the most recent 
data available to us, and we also have found it to be the most reliable. We also explore 
using a multi-year model that includes data on all entries from FY2008 to FY2010. 

A. COMPUTED TIER SCORE ANALYSIS 

In this section we present the results of fitting a regression model to predict the 
Computed Tier Score from a set of explanatory variables obtained at the initial point of 
enlistment, using data for FY2010 entries. The independent variables used in all 
regression analyses are described in Table 5. 

1. Initial Variable Relationship Exploration 

Figure 3 shows a series of plots that provide an initial look at the nature of the 
relationships between each independent variable and the Computed Tier Score. The red 
line in each plot is a regression trend line that describes the mean relationship between 
the two variables. 

Figure 3. Initial variable relationships to Computed Tier Score for the FY2010 data 

Note: Computed Tier Score is on the vertical axis of each plot, and each independent 
variable is on the horizontal axis. 


An initial observation of the relationships between Computed Tier Score and the 
independent variables suggests the presence of possible relationships between variables. 
For example, the upward trend of the red regression line in the IST_CRUNCHES plot 
indicates that as the number of crunches increases, the Computed Tier Score increases. 
Similarly, as the IST_RUN time increases, the Computed Tier Score decreases. 

2. Evaluation of the Regression Model 

We explore the possibility that the response variable, Computed Tier Score, may 
need to be transformed in order to better satisfy the assumptions of a linear regression 
model. To do this we use the Box-Cox transformation method described in Chapter III in 
order to determine if a transformation of the dependent variable would be appropriate. 

For this model, the Box-Cox method produces an estimated exponent of $\lambda = 5.6$, which 
is extreme given that the numerical scale of Computed Tier Score is in the low thousands. 
This result suggests that the Box-Cox family of transformations cannot provide a useful 
transformation of the dependent variable as discussed in Faraway (2005). We decide not to 
transform the dependent variable in this case, accepting that by not doing so the error 
terms may not be approximately normally distributed, which requires greater care 
to guard against the effects of outliers and other influential observations. 

We begin with the 18 possible predictor variables that remain in Table 5 after excluding 
the ASVAB subscores. Variable selection using Best Subsets Regression with 
cross-validation is performed in order to find a near-optimal model based on the original set of 
independent variables as discussed in Chapter III. When conducting cross-validation, we 
find the optimal model to contain five predictor variables: WEIGHT, 
IST_CRUNCHES, IST_RUN, RIFLE_SCORE, and WAIV_WEIGHT. We then find the 
best subset that maximizes adjusted $R^2$ in order to include variables that are highly 
regarded as entrance criteria for an MOS in the USMC. The best subset size contains 
eight predictor variables. The resulting model is summarized in Figure 4. 
lm(formula = Tier ~ WEIGHT + IST_CRUNCHES + IST_RUN + RIFLE_SCORE +
    WAIV_WEIGHT + GT_SCORE + MM_SCORE + CL_SCORE, data = Master10)

Residuals:
    Min      1Q  Median      3Q     Max
-455.34  -33.64    7.38   47.42  245.74

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     1769.18820   96.98545  18.242  < 2e-16 ***
WEIGHT            -0.34523    0.17437  -1.980  0.04839 *
IST_CRUNCHES       0.62392    0.23601   2.644  0.00852 **
IST_RUN           -0.20447    0.05594  -3.655  0.00029 ***
RIFLE_SCORE        0.46530    0.23745   1.960  0.05072 .
WAIV_WEIGHTTRUE  -65.00997   22.68478  -2.866  0.00437 **
GT_SCORE           2.13337    1.00373   2.125  0.03414 *
MM_SCORE          -1.38329    0.70565  -1.960  0.05063 .
CL_SCORE          -1.14032    0.74311  -1.535  0.12567
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 87.18 on 412 degrees of freedom
Multiple R-squared:  0.1273,    Adjusted R-squared:  0.1103
F-statistic: 7.509 on 8 and 412 DF,  p-value: 2.273e-09

Figure 4. Computed Tier Score model output 


In Figure 4, the “Estimate” column shows the regression coefficients for each 
corresponding predictor variable, while the “Pr(>|t|)” column gives the associated 
p-values for each estimate. A p-value of less than 0.05 suggests that the variable is 
statistically significant, and should be included in the model. 
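
The columns of the regression output are linked by simple arithmetic: each t value is the coefficient estimate divided by its standard error, and the p-value is then compared with the 0.05 threshold. The short Python check below uses three rows and two p-values taken from Figure 4; the variable and function names are hypothetical.

```python
def t_value(estimate, std_error):
    """t statistic for testing whether a coefficient is zero."""
    return estimate / std_error


# (estimate, standard error) pairs as reported in Figure 4.
figure4_rows = {
    "WEIGHT": (-0.34523, 0.17437),
    "IST_CRUNCHES": (0.62392, 0.23601),
    "IST_RUN": (-0.20447, 0.05594),
}
t_values = {name: round(t_value(est, se), 3)
            for name, (est, se) in figure4_rows.items()}

# p-values as reported in Figure 4, compared with the 0.05 threshold.
reported_p = {"WEIGHT": 0.04839, "RIFLE_SCORE": 0.05072}
significant = {name: p < 0.05 for name, p in reported_p.items()}
```

WEIGHT falls just below the threshold and RIFLE_SCORE just above it, which shows how sensitive a fixed 0.05 cutoff can be for borderline variables.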

The model from Figure 4 includes 421 records and eight predictor variables from 
the 2010 data set. Table 6 shows the descriptive statistics for the seven continuous 
variables included in the model. The descriptive statistics shown are mean, median, 
standard deviation, minimum value, and maximum value. The only binary variable 
included in the model is WAIV_WEIGHT, with 405 Marines (96.2 percent) not assigned 
a weight waiver and 16 Marines (3.8 percent) receiving a weight waiver before entering 
active service. 

Table 6. Descriptive statistics for the quantitative variables used in Computed Tier Score analysis 

Variable        Mean     Median   Standard Deviation   Minimum   Maximum
WEIGHT          161.90   161      25.62                96        224
IST_CRUNCHES    75.41    73       19.51                44        155
IST_RUN         690.90   690      83.66                460       892
RIFLE_SCORE     287.50   290      19.42                250       329
GT_SCORE        101.80   99       10.40                80        136
MM_SCORE        100.20   98       11.89                69        140
CL_SCORE        103.40   101      8.87                 87        137

We explore the necessity for non-linear transformations of the independent 
variables using partial residual plots (Faraway, 2005). We use cubic basis splines with 
four interior knots, placed at the 10th, 30th, 50th, and 70th percentiles of each variable, 
to determine whether a non-linear transformation of the predictor variables would improve 
the model. Plotted with 95 percent confidence bands, the fitted splines suggest the types of 
transformations that are plausible for the predictor variables. For example, if a straight 
line fits within the confidence bands, it is unlikely that a nonlinear transformation is 
needed to bring out the explanatory power of the variable in question. The resulting partial 
residual plots are shown in Figure 5. Straight lines can be fit within the confidence bands 
of each of these plots, which suggests that a simple linear model formulation should be 
adequate. We confirm this by conducting an F-test with (42, 370) degrees of freedom to 
compare a model with the spline transformations against a model without them. 
The resulting F-statistic is 0.9534 with a p-value of 0.5574. This comparison indicates 
that the model with transformations is not significantly different from the model without 
transformations at the $\alpha = 0.05$ test level. Therefore, we do not reject the null hypothesis 
and conclude that non-linear transformation of the predictor variables is not necessary. 
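
The F-test for comparing nested models can be sketched as follows. The degrees-of-freedom arithmetic assumes that each of the seven continuous predictors is replaced by a cubic B-spline with four interior knots (seven basis columns in place of one linear term, so six extra parameters per variable); this assumption reproduces the (42, 370) degrees of freedom quoted above. The residual sums of squares in the Python example are hypothetical and serve only to show the calculation.

```python
def nested_f_statistic(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic comparing a reduced model with a larger nested model.

    df_reduced and df_full are the residual degrees of freedom.
    """
    numerator_df = df_reduced - df_full
    f_stat = ((rss_reduced - rss_full) / numerator_df) / (rss_full / df_full)
    return f_stat, numerator_df, df_full


# The linear model in Figure 4 leaves 412 residual degrees of freedom.
df_linear = 412
# Assumption: 7 continuous predictors, each gaining 4 + 3 - 1 = 6 parameters.
extra_params = 7 * (4 + 3 - 1)
df_spline = df_linear - extra_params

# Hypothetical residual sums of squares, for illustration only.
f_stat, df_num, df_den = nested_f_statistic(3_131_000.0, df_linear,
                                            2_830_000.0, df_spline)
```

An F statistic near one, as in this illustration, means the extra spline terms reduce the residual sum of squares by about as much as would be expected from noise alone.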
[Figure 5 (image not reproduced): partial residual plots; legible panel titles include 
IST_RUN, RIFLE_SCORE, GT_SCORE, and MM_SCORE.] 

Figure 5. Partial residual plots of the predictor variables used in the 
analysis of Computed Tier Score 

Note: The red line is the cubic regression spline, and the blue lines are 95 percent 
confidence bands. A straight line that fits between the blue confidence bands is a good 
indication of a linear relationship. 

The regression diagnostics are displayed in Figure 6 using a set of diagnostic 
plots that allow for examination of the model assumptions. 

Figure 6. Computed Tier Score model diagnostics 

As shown in Figure 6, the Residuals vs. Fitted plot shows no obvious patterns of 
unequal spread about the x-axis, thus indicating that the residuals exhibit constant 
variance. The Normal Q-Q plot indicates a presence of heavier than normal tails, and 
exhibits possible signs of non-normality. The Residuals vs. Leverage plot shows no 
indication of overly influential data points in the model. In other respects, the model 
diagnostics indicate that the model assumptions are not violated. 

3. Explanation of the Model Results 

From the model fit in Figure 4, we determine that the most significant predictor 
variables are IST_RUN, WAIV_WEIGHT, and IST_CRUNCHES. Additionally, 
WEIGHT and GT_SCORE narrowly meet the 0.05 p-value threshold for inclusion in the 
model. MM_SCORE and CL_SCORE exhibit interesting relationships to Computed Tier 
Score, indicating that with a higher score in either test, the predicted Computed Tier 
Score actually decreases. This model provides statistically significant predictability for 
measuring success in terms of Computed Tier Score. 

B. ANALYSIS OF THE TIME TO ACHIEVE E-4 USING ALL POSSIBLE PREDICTOR VARIABLES 
INCLUDING ASVAB SUBSCORES 

The second dependent variable we consider in this analysis is the time it takes in 
days for a Marine to achieve the pay grade of E-4, and is referred to as time2E4 in this 
study. The remaining models in this study focus exclusively on analyzing the entry-level 
attributes of a Marine recruit against this dependent variable. 

Upon initial observation and variable correlation exploration, we determine that 
the ASVAB subscores are highly correlated with the ASVAB composite scores. This 
observation makes sense, given that the composite scores are derived from the subscores. 
Therefore, we perform two separate linear regressions in order to accurately consider all 
of the possible predictors. The first model considers all possible predictor variables 
excluding the ASVAB composite scores. The second model, which we discuss in section C, 
considers all possible predictor variables while excluding the ASVAB subscores. 

1. Initial Variable Relationship Exploration 

Figure 7 shows a series of plots that provide an initial look at the nature of the 
relationships between each independent variable and time2E4. The red line in each 
plot is a regression trend line that describes the mean relationship between the two 
variables. 

Figure 7. Initial variable relationships to time to promote to E-4 for the 2010 data set 

Note: Time2E4 is on the vertical axis of each plot, and each independent variable is on 
the horizontal axis. 


An initial observation of the relationships between time2E4 and the independent 
variables suggests a presence of possible relationships between variables. For example, 
the downward trend of the red regression line in the IST_CRUNCHES plot indicates that 
as the number of crunches increases, the time to achieve the pay grade of E-4 decreases. 
Similarly, as the IST_RUN time increases, the time to achieve E-4 increases. 


2. Evaluation of the Regression Model 

We explore the possibility that the response variable, time2E4, may need to be 
transformed in order to better satisfy the assumptions of a linear regression model. Based 
on the application of the Box-Cox procedure as described in Chapter III, the dependent 
variable, time2E4, is transformed by being raised to the power -0.7. 

Prior to estimating the linear regression model, we conduct variable selection in 
order to find a near-optimal model based on the original set of independent variables. We 
begin with 22 possible predictor variables, as listed in Table 5, excluding the ASVAB 
composite scores. Variable selection using Best Subsets Regression with cross-validation 
is performed to find the best subset of the original independent variables as discussed in 
Chapter III. Figure 8 shows the results of fitting the linear regression model for an 
individual Marine’s predicted time2E4. 


lm(formula = Ytime ~ IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT +
    GS + MK + PC, data = Master10)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0044329 -0.0008939  0.0000736  0.0010470  0.0062517

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      4.348e-03  1.734e-03   2.507  0.01254 *
IST_CRUNCHES     1.177e-05  4.180e-06   2.816  0.00509 **
IST_RUN         -2.530e-06  9.829e-07  -2.574  0.01040 *
RIFLE_SCORE      1.260e-05  4.061e-06   3.103  0.00205 **
WAIV_WEIGHTTRUE -8.545e-04  4.021e-04  -2.125  0.03414 *
GS              -4.092e-05  1.383e-05  -2.958  0.00327 **
MK               4.516e-05  1.491e-05   3.030  0.00260 **
PC               3.780e-05  1.505e-05   2.513  0.01236 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001551 on 413 degrees of freedom
Multiple R-squared:  0.1432,    Adjusted R-squared:  0.1286
F-statistic: 9.859 on 7 and 413 DF,  p-value: 2.181e-11

Figure 8. All variables with ASVAB subscore model output 

Based on the results from Figure 8, the variables included in the model are 
statistically significant with p-values of less than 0.05, as seen in the “Pr(>|t|)” column. 

Table 7 shows the descriptive statistics for the six quantitative variables included 
in the model. The descriptive statistics shown are mean, median, standard deviation, 
minimum value, and maximum value. 


Table 7. Descriptive statistics for the quantitative variables used in the 
analysis of time to achieve E-4 using ASVAB subscore and all predictors 

Variable        Mean     Median   Standard Deviation   Minimum   Maximum
IST_CRUNCHES    75.41    73       19.51                44        155
IST_RUN         690.90   690      83.66                460       892
RIFLE_SCORE     287.50   290      19.42                250       329
GS              50.58    50       6.35                 35        73
MK              53.00    53       5.14                 38        72
PC              51.47    51       5.85                 37        69


We explore the necessity for non-linear transformations of the independent 
variables using partial residual plots (Faraway, 2005). As shown in Figure 9, we use cubic 
basis splines with four interior knots in order to determine if a non-linear transformation 
of the predictor variables would improve the model. Straight lines can be 
fit within the confidence bands of each of these plots, which suggests that a simple linear 
model formulation should be adequate. We confirm this by conducting an F-test with 
(42, 370) degrees of freedom to compare a model with the spline transformations 
against a model without them. The resulting F-statistic is 1.5163 
with a p-value of 0.1715. This comparison indicates that the model with transformations 
is not significantly different from the model without transformations at the $\alpha = 0.05$ test 
level. Therefore, we do not reject the null hypothesis and conclude that non-linear 
transformation of the predictor variables is not necessary. 

Figure 9. Partial residual plots of the predictor variables used in the analysis of time to 
achieve E-4 using ASVAB subscore and all predictors 

Note: The red line is the cubic regression spline, and the blue lines are 95 percent 
confidence bands. A straight line that fits between the blue confidence bands is a good 
indication of a linear relationship. 

The model diagnostic plots shown in Figure 10 indicate that the model 
assumptions are met and support the findings of the model. 

Figure 10. All variables with ASVAB subscores model diagnostics, 
after Box-Cox transformation 

The Residuals vs. Fitted plot shows no signs of heteroscedasticity as there are no 
obvious patterns of unequal spread about the horizontal axis, thus indicating that the 
residuals exhibit constant variance. The Normal Q-Q plot indicates that the distribution of 
our data supports normality as the points trend nearly to a straight line. Finally, the 
Residuals vs. Leverage plot indicates that there are no overly influential data points in the 
model. The largest value for Cook’s distance is 0.044, which is well below the commonly 
used warning value of 0.5. The plots in Figure 10 indicate that the model assumptions 
have been met and provide a valid model. 

3. Explanation of the Model Results 

From the model fit in Figure 8, we find that the most significant predictor 
variables for time2E4 are RIFLE_SCORE, MK, GS, IST_CRUNCHES, IST_RUN, PC, 
and WAIV_WEIGHT. It is important to note that while these variables are statistically 
significant in this model, they would not necessarily be statistically significant or have 
the same level of significance when modelled with a different year of records. The model 
results differ from those of the Computed Tier Score model and show different 
relationships between the entry-level attributes and each dependent variable. This 
provides evidence that the two metrics, or dependent variables, used in our analysis are 
substantially different. 

To evaluate the effect each predictor variable has on the estimated time to achieve
the pay grade of E-4, we use the median values shown in Table 7 to create a notional
Marine for comparison. This notional Marine not receiving a weight waiver has an
estimated time2E4 of approximately 787.2 days, with a 95 percent confidence interval of
[526.6, 1380.3]. If the notional Marine received a weight waiver before entering service,
then the estimated time2E4 is approximately 902.2 days, with a 95 percent confidence
interval of [576.4, 1739.3].

Tables 8 and 9 show the individual effect on the estimated time2E4 of
increasing or decreasing each of the six numerical predictor variables by 10 percent,
as well as of varying WAIV_WEIGHT from false to true. Beginning in the second column,
each column shows the effect on the predicted time2E4 of changing only the heading
variable while holding all other variables constant. The “Difference” row shows the
individual impact that each change in the predictor variable has on time2E4. The
“Accounting for censoring” row shows the predicted time2E4 while accounting for
censoring in the model, using the method described in Chapter IV for fitting regressions
to censored data. The variable names have been shortened for presentation of the data.
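As a minimal sketch of how the inputs to Tables 8 and 9 can be generated, the Python snippet below shifts each median of the notional Marine by 10 percent and rounds to a whole number. The keys reuse the shortened variable names from the tables; the helper name `shifted` is hypothetical and is not part of the thesis code.

```python
# Medians of the notional Marine, from the "Notional" column of Tables 8 and 9.
NOTIONAL = {"CRUNCHES": 73, "RUN": 690, "RIFLE": 290, "GS": 50, "MK": 53, "PC": 51}

def shifted(values, fraction):
    """Return each value changed by the given fraction, rounded to a whole number."""
    return {name: round(value * (1.0 + fraction)) for name, value in values.items()}

increased = shifted(NOTIONAL, 0.10)
decreased = shifted(NOTIONAL, -0.10)
print(increased)
print(decreased)
```

The rounded values agree with the changed entries in the two tables, with one caveat: Table 8 lists 621 for RUN, which is the 10 percent decrease (a faster run), and Table 9 lists 759.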
Table 8. Effect of increasing predictor variable values on predicted time to achieve the pay grade of E-4

| Variable | Notional | CRUNCHES | RUN | RIFLE | GS | MK | PC | WEIGHT |
|---|---|---|---|---|---|---|---|---|
| CRUNCHES | 73 | 80 | 73 | 73 | 73 | 73 | 73 | 73 |
| RUN | 690 | 690 | 621 | 690 | 690 | 690 | 690 | 690 |
| RIFLE | 290 | 290 | 290 | 319 | 290 | 290 | 290 | 290 |
| GS | 50 | 50 | 50 | 50 | 55 | 50 | 50 | 50 |
| MK | 53 | 53 | 53 | 53 | 53 | 58 | 53 | 53 |
| PC | 51 | 51 | 51 | 51 | 51 | 51 | 56 | 51 |
| WEIGHT | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
| Time2E4 | 787.2 | 777.5 | 766.8 | 745.5 | 812.4 | 761.0 | 765.1 | 902.2 |
| Difference | - | -9.7 | -20.4 | -41.7 | 25.2 | -26.2 | -22.1 | 115.0 |
| Accounting for censoring | 791.2 | 780.8 | 769.8 | 749.1 | 817.1 | 764.6 | 767.9 | 914.1 |


Note: The changes to each predictor variable are indicated by the red numbers, 
while holding all other values of the predictor variables constant. 
Table 9. Effect of decreasing predictor variable values on predicted time to achieve the pay grade of E-4

| Variable | Notional | CRUNCHES | RUN | RIFLE | GS | MK | PC | WEIGHT |
|---|---|---|---|---|---|---|---|---|
| CRUNCHES | 73 | 66 | 73 | 73 | 73 | 73 | 73 | 73 |
| RUN | 690 | 690 | 759 | 690 | 690 | 690 | 690 | 690 |
| RIFLE | 290 | 290 | 290 | 261 | 290 | 290 | 290 | 290 |
| GS | 50 | 50 | 50 | 50 | 45 | 50 | 50 | 50 |
| MK | 53 | 53 | 53 | 53 | 53 | 48 | 53 | 53 |
| PC | 51 | 51 | 51 | 51 | 51 | 51 | 46 | 51 |
| WEIGHT | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
| Time2E4 | 782.2 | 797.2 | 808.6 | 833.2 | 762.4 | 815.1 | 810.5 | 902.2 |
| Difference | - | 15.0 | 26.4 | 51.0 | -19.8 | 32.9 | 28.3 | 120.0 |
| Accounting for censoring | 791.2 | 802.0 | 813.7 | 837.6 | 766.7 | 819.5 | 815.8 | 914.1 |


Note: The changes to each predictor variable are indicated by the red numbers, 
while holding all other values of the predictor variables constant. 
From Table 8, the largest improvement in predicted time2E4 results from an
increase in Rifle Score, followed by MK, PC, Run Time, and Crunches, respectively. 
Receiving a weight waiver significantly impacts the predicted value in a negative way, by 
increasing the predicted time2E4 by 115 days. Table 9 presents the effects of decreasing 
each of the independent variables by the same magnitudes of change used in Table 8. As 
shown in the model summary presented in Figure 8, the GS ASVAB subscore indicates a 
surprisingly negative relationship with achieving the pay grade of E-4. This may be due 
to the correlation of the GS subscore to the other predictor variables present in the model, 
and warrants further investigation as additional data become available. The last row of 
these two tables gives the results of applying the survreg function in R to account for the 
twenty censored values of time2E4. Not surprisingly, the predicted times to promotion 
are somewhat larger when censoring is taken into account, although the effect is minimal. 
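To illustrate why accounting for censoring raises the predicted times, the sketch below maximizes a right-censored log-likelihood over the mean of a normal model. It is not the survreg fit used in the thesis: the normal distribution, the fixed scale `sigma`, the grid search, and the eight data values are all assumptions chosen to keep the example short.

```python
import math

def censored_loglik(mu, sigma, times, censored):
    """Log-likelihood of a normal model in which some observations are right-censored."""
    total = 0.0
    for t, is_censored in zip(times, censored):
        z = (t - mu) / sigma
        if is_censored:
            # Only a lower bound is known: use the survival function P(T > t).
            total += math.log(0.5 * math.erfc(z / math.sqrt(2.0)))
        else:
            # Fully observed value: use the normal density.
            total += -math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
    return total

# Hypothetical days to E-4; the last two Marines had not yet been promoted at day 900.
times = [700, 750, 780, 800, 820, 860, 900, 900]
censored = [False, False, False, False, False, False, True, True]

sigma = 60.0  # held fixed for simplicity; a full fit would estimate it as well
naive_mu = sum(times) / len(times)  # treats the censored values as if observed
grid = [700 + 0.5 * k for k in range(601)]  # candidate means from 700 to 1000 days
best_mu = max(grid, key=lambda m: censored_loglik(m, sigma, times, censored))
print(naive_mu, best_mu)
```

Because each censored Marine contributes only the information that the true time exceeds the cutoff, the estimated mean moves above the naive average, which is the direction of the shift reported in the last row of Tables 8 and 9.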

C. ANALYSIS OF THE TIME TO ACHIEVE E-4 USING ALL POSSIBLE
PREDICTOR VARIABLES INCLUDING ASVAB COMPOSITE SCORES

1. Evaluation of the Regression Model 

We first explore the possibility of transforming the response variable, time2E4,
using the Box-Cox procedure outlined in Chapter III. Based on an application of this
procedure, the dependent variable was transformed by being raised to
the power -0.7.
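A short Python sketch of this transformation and its inverse is given below. It assumes the response is the plain power time2E4^(-0.7), which is consistent with the coefficients and predictions reported for Figure 11, rather than the scaled Box-Cox form; the function names are hypothetical.

```python
LAMBDA = -0.7  # Box-Cox power reported in the text

def to_model_scale(days):
    """Transform days to E-4 onto the scale on which the regression is fitted."""
    return days ** LAMBDA

def to_days(value):
    """Invert the transformation to express a fitted value in days."""
    return value ** (1.0 / LAMBDA)

fast, slow = to_model_scale(800.0), to_model_scale(900.0)
print(fast, slow)
print(to_days(to_model_scale(794.9)))
```

Because the power is negative, the transformation reverses the ordering: a longer time to E-4 maps to a smaller transformed value. A positive regression coefficient on the transformed scale therefore corresponds to a shorter predicted time to promotion.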

We conduct variable selection in order to find a near-optimal model based on the
original set of 20 independent variables, as listed in Table 5, excluding the ASVAB
subscores. Best Subsets Regression with cross-validation is used to identify a
subset of predictor variables for the development of the regression model. Figure 11
shows the results of fitting the linear regression model using the optimal set of
independent variables for an individual Marine’s predicted time to achieve the pay grade of
E-4. There was no need for non-linear transformation of variables, as the predictive
power of the model would not be improved.
lm(formula = Ytime ~ IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT +
    CL_SCORE, data = Master10)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0045221 -0.0008783  0.0001143  0.0010144  0.0062726

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      5.198e-03  1.696e-03   3.065  0.00232 **
IST_CRUNCHES     1.262e-05  4.229e-06   2.983  0.00302 **
IST_RUN         -2.733e-06  9.956e-07  -2.745  0.00632 **
RIFLE_SCORE      1.050e-05  4.058e-06   2.588  0.01000 **
WAIV_WEIGHTTRUE -8.106e-04  4.071e-04  -1.991  0.04711 *
CL_SCORE         2.029e-05  8.727e-06   2.325  0.02056 *

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.001575 on 415 degrees of freedom
Multiple R-squared:  0.1123, Adjusted R-squared:  0.1016
F-statistic:  10.5 on 5 and 415 DF,  p-value: 1.657e-09


Figure 11. All variables with ASVAB composite scores model output 


The results of model fitting shown in Figure 11 indicate that the variables 
included in the model are statistically significant with p-values of less than 0.05. 
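The summary statistics in Figure 11 can be cross-checked with a few lines of Python. The sketch below recomputes each t value as the estimate divided by its standard error, and recovers the adjusted R-squared and F-statistic from the reported R-squared. The sample size of 421 is inferred from the 415 residual degrees of freedom plus six estimated coefficients; the variable names are hypothetical.

```python
# (estimate, standard error) pairs transcribed from Figure 11.
FIGURE_11 = {
    "(Intercept)": (5.198e-03, 1.696e-03),
    "IST_CRUNCHES": (1.262e-05, 4.229e-06),
    "IST_RUN": (-2.733e-06, 9.956e-07),
    "RIFLE_SCORE": (1.050e-05, 4.058e-06),
    "WAIV_WEIGHTTRUE": (-8.106e-04, 4.071e-04),
    "CL_SCORE": (2.029e-05, 8.727e-06),
}

# Each t value is the coefficient estimate divided by its standard error.
t_values = {name: est / se for name, (est, se) in FIGURE_11.items()}

N_OBS = 421        # inferred: 415 residual degrees of freedom + 6 coefficients
N_PREDICTORS = 5
R_SQUARED = 0.1123

resid_df = N_OBS - N_PREDICTORS - 1
adj_r_squared = 1.0 - (1.0 - R_SQUARED) * (N_OBS - 1) / resid_df
f_statistic = (R_SQUARED / N_PREDICTORS) / ((1.0 - R_SQUARED) / resid_df)

for name, t in t_values.items():
    print(f"{name:16s} {t:7.3f}")
print(round(adj_r_squared, 4), round(f_statistic, 2))
```

The recomputed values agree with the figure to within rounding of the printed estimates.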

The descriptive statistics for the four quantitative variables included in the model 
are displayed in Table 10, and include mean, median, standard deviation, minimum 
value, and maximum value. 


Table 10. Descriptive statistics for the quantitative variables used in the
analysis of time to achieve E-4 using ASVAB composite score and all predictors

| Variable | Mean | Median | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| IST_CRUNCHES | 75.41 | 73 | 19.51 | 44 | 155 |
| IST_RUN | 690.90 | 690 | 83.66 | 460 | 892 |
| RIFLE_SCORE | 287.50 | 290 | 19.42 | 250 | 329 |
| CL_SCORE | 103.40 | 101 | 8.87 | 87 | 137 |


The model diagnostic plots given in Figure 12 indicate that the model 
assumptions are met and support the findings of the model. 
Figure 12. All variables with ASVAB composite scores model diagnostics,
after Box-Cox transformation


The diagnostic plots provide evidence that the errors are independent, have
constant variance, and are normally distributed, and that there are no overly influential
observations that could affect the model.

2. Explanation of the Model Results 

From the fitted model in Figure 11, we have determined that the most significant
predictor variables are IST_RUN, IST_CRUNCHES, RIFLE_SCORE, CL_SCORE, and
WAIV_WEIGHT. We evaluate the effect that each predictor variable has on the
estimated time to achieve the pay grade of E-4 by using the median values shown in
Table 10 to create a notional Marine for comparison. This notional Marine without a
weight waiver has an estimated time2E4 of approximately 794.9 days, with a 95 percent
confidence interval of [527.6, 1415.3]. If the notional Marine received a weight waiver
before entering service, then the estimated time2E4 is approximately 905.1 days, with a
95 percent confidence interval of [574.3, 1770.9].

Tables 11 and 12 show the individual effect on the estimated time2E4 of
increasing or decreasing each of the four numerical predictor variables by 10 percent,
as well as of varying WAIV_WEIGHT from false to true. The changes to each predictor
variable are indicated by the red numbers. Beginning in the second column, each column
shows the effect on the predicted time2E4 (in days) of changing only the heading
variable while holding all other variables constant. The “Difference” row shows the
individual impact that each change in the predictor variable has on time2E4. The
“Accounting for censoring” row shows the predicted time2E4 while accounting for
censoring in the model, displaying only minimal effect from censoring. The variable
names have been shortened for presentation of the data.
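The one-at-a-time calculation can be sketched in Python from the coefficients in Figure 11 and the medians in Table 10. The sketch assumes that Ytime equals time2E4 raised to the power -0.7, so that a fitted value is converted back to days by raising it to the power 1/(-0.7); the names `COEF`, `NOTIONAL`, and `predict_days` are hypothetical. Under that assumption the results land within a fraction of a day of the reported predictions, for example roughly 794.9 days for the notional Marine.

```python
LAMBDA = -0.7  # assumed: Ytime = time2E4 ** LAMBDA

# Coefficient estimates transcribed from Figure 11.
COEF = {
    "(Intercept)": 5.198e-03,
    "IST_CRUNCHES": 1.262e-05,
    "IST_RUN": -2.733e-06,
    "RIFLE_SCORE": 1.050e-05,
    "WAIV_WEIGHT": -8.106e-04,  # applies when a weight waiver is received
    "CL_SCORE": 2.029e-05,
}

# Notional Marine built from the medians in Table 10, without a weight waiver.
NOTIONAL = {"IST_CRUNCHES": 73, "IST_RUN": 690, "RIFLE_SCORE": 290,
            "WAIV_WEIGHT": 0, "CL_SCORE": 101}

def predict_days(marine):
    """Fitted value on the transformed scale, converted back to days."""
    linear = COEF["(Intercept)"] + sum(COEF[k] * v for k, v in marine.items())
    return linear ** (1.0 / LAMBDA)

baseline = predict_days(NOTIONAL)

# Change one input at a time, using the changed values listed in Table 11.
changes = {"IST_CRUNCHES": 80, "IST_RUN": 621, "RIFLE_SCORE": 319,
           "CL_SCORE": 111, "WAIV_WEIGHT": 1}
results = {name: predict_days({**NOTIONAL, name: value})
           for name, value in changes.items()}

print(f"{'Notional':13s} {baseline:7.1f}")
for name, days in results.items():
    print(f"{name:13s} {days:7.1f} {days - baseline:+7.1f}")
```

The confidence intervals and the censoring-adjusted row are not reproduced here, since they depend on quantities that are not shown in the model output.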


Table 11. Effect of increasing predictor variable value on the predicted time
to achieve the pay grade of E-4

| Variable | Notional | CRUNCHES | RUN | RIFLE | CL_SCORE | WEIGHT |
|---|---|---|---|---|---|---|
| CRUNCHES | 73 | 80 | 73 | 73 | 73 | 73 |
| RUN | 690 | 690 | 621 | 690 | 690 | 690 |
| RIFLE | 290 | 290 | 290 | 319 | 290 | 290 |
| CL_SCORE | 101 | 101 | 101 | 101 | 111 | 101 |
| WEIGHT | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
| Time2E4 | 794.9 | 784.2 | 772.5 | 759.2 | 770.8 | 905.1 |
| Difference | - | -10.7 | -22.4 | -35.7 | -24.1 | 110.2 |
| Accounting for censoring | 798.9 | 787.5 | 775.4 | 763.0 | 774.3 | 917.0 |


Note: The changes to each predictor variable are indicated by the red numbers, while 
holding all other values of the predictor variables constant. 
Table 12. Effect of decreasing predictor variable values on the predicted time
to achieve the pay grade of E-4

| Variable | Notional | CRUNCHES | RUN | RIFLE | CL_SCORE | WEIGHT |
|---|---|---|---|---|---|---|
| CRUNCHES | 73 | 66 | 73 | 73 | 73 | 73 |
| RUN | 690 | 690 | 759 | 690 | 690 | 690 |
| RIFLE | 290 | 290 | 290 | 261 | 290 | 290 |
| CL_SCORE | 101 | 101 | 101 | 101 | 91 | 101 |
| WEIGHT | FALSE | FALSE | FALSE | FALSE | FALSE | TRUE |
| Time2E4 | 794.9 | 805.7 | 818.4 | 833.4 | 820.2 | 905.1 |
| Difference | - | 10.8 | 23.5 | 38.5 | 25.3 | 110.2 |
| Accounting for censoring | 798.9 | 810.6 | 823.6 | 837.9 | 824.9 | 917.0 |


Note: The changes to each predictor variable are indicated by the red numbers, while 
holding all other values of the predictor variables constant. 


From Table 11, the largest improvement in predicted time2E4 results from an
increase in Rifle Score, followed by CL_SCORE, Run Time, and Crunches, respectively.
Receiving a weight waiver significantly impacts the predicted value in a negative way, by
increasing the predicted time2E4 by 110.2 days. Table 12 presents the effect of
degrading each of the independent variables and provides the same ranking of
the independent variables.

3. Evaluation and Comparison of the Regression Model Results for the 
ASVAB Subscore Model and the ASVAB Composite Score Model 

This section provides a summary and comparison of the model outputs from the
two models considered in predicting time2E4: the regression model that uses all possible
predictor variables including the ASVAB subscores and the regression model that uses all
possible predictor variables including the ASVAB composite scores.

Table 13 displays a summary of the model predictions for a notional Marine who
did not receive a weight waiver.
Table 13. Comparison of model results for a notional Marine that did not
receive a weight waiver

| Model | Variables included in Model | Predicted time2E4 w/out Weight Waiver | 95% CI |
|---|---|---|---|
| ASVAB Subscore | IST_CRUNCHES, IST_RUN, RIFLE_SCORE, GS, MK, PC, WAIV_WEIGHT | 787.2 | [526.6, 1380.3] |
| ASVAB Composite Score | IST_CRUNCHES, IST_RUN, RIFLE_SCORE, CL_SCORE, WAIV_WEIGHT | 794.9 | [527.6, 1415.3] |


Table 13 shows that the two models produce similar outputs. Both models find IST_CRUNCHES,
IST_RUN, RIFLE_SCORE, and WAIV_WEIGHT to be statistically significant for inclusion.
Each of these variables acts in the same direction in both models, either increasing or
decreasing the predicted value of the dependent variable. Each model provides similar
predictions and 95 percent confidence intervals for the predicted time2E4.

D. EXPLORATION OF COMBINING THE DATA INTO A MULTI-YEAR 
MODEL (FY2008-FY2010) 

This section of the analysis explores the possibility of pooling the data from each 
year into one complete data set of Marines with the 0621 MOS from FY2008 through 
FY2010. Pooling the data into a multi-year study allows us to determine if the entry-level 
attributes are consistently predictive over time. The breakdown of the number of 
observations used by year is shown in Table 14. The total number of observations 
included in the model is 1,126. 


Table 14. Summary of the number of observations used by year

| Fiscal Year | Number of Observations |
|---|---|
| 2008 | 351 |
| 2009 | 354 |
| 2010 | 421 |
| Total | 1,126 |
1. Evaluation of the Regression Model


We begin with 18 possible predictor variables, as listed in Table 5, excluding the
ASVAB subscores. Based on an application of the Box-Cox procedure, the dependent
variable, time2E4, was transformed by being raised to the power -0.3. In order to
determine if the data from individual fiscal years can be pooled to fit a common model,
we add the fiscal year as a categorical variable and run the regression model. The results
of fitting the linear regression model are displayed in Figure 13.


lm(formula = (time2E4)^(-0.3) ~ year + Age + GENDER + WEIGHT +
    IST_CRUNCHES + IST_RUN + RIFLE_SCORE + WAIV_WEIGHT + CL_SCORE,
    data = Master, subset = tt.noseps)

Residuals:
      Min        1Q    Median        3Q       Max
-0.030314 -0.007128  0.001173  0.007072  0.042389

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)      1.294e-01  7.020e-03  18.426  < 2e-16 ***
year09          -5.718e-03  1.108e-03  -5.159 2.94e-07 ***
year10          -7.218e-03  1.107e-03  -6.518 1.08e-10 ***
Age              6.490e-04  1.870e-04   3.470 0.000540 ***
GENDERM         -2.713e-03  1.374e-03  -1.975 0.048554 *
WEIGHT          -3.628e-05  1.457e-05  -2.490 0.012910 *
IST_CRUNCHES     4.975e-05  2.117e-05   2.350 0.018953 *
IST_RUN         -2.462e-05  5.229e-06  -4.708 2.82e-06 ***
RIFLE_SCORE      5.167e-05  1.286e-05   4.017 6.30e-05 ***
WAIV_WEIGHTTRUE -3.741e-03  1.453e-03  -2.575 0.010155 *
CL_SCORE         5.879e-05  1.717e-05   3.424 0.000639 ***

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01165 on 1115 degrees of freedom
Multiple R-squared:  0.09637, Adjusted R-squared:  0.08826
F-statistic: 11.89 on 10 and 1115 DF,  p-value: < 2.2e-16


Figure 13. Multi-year model including year variable 


In the model shown in Figure 13, the indicator variables for the individual fiscal
years are statistically significant predictors. The year variable therefore provides
statistically significant information in predicting a Marine’s time to achieve the pay grade
of E-4: the regression coefficients for year09 and year10 both differ significantly from
zero relative to the FY2008 baseline. This result argues against pooling the data from
different years to fit a common model.
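The way a categorical year variable enters the regression can be sketched in Python as follows. The indicator names year09 and year10 and their coefficients are taken from Figure 13, with FY2008 as the omitted baseline level; the function names are hypothetical.

```python
# Year coefficients transcribed from Figure 13 (FY2008 is the baseline level).
YEAR_COEF = {"year09": -5.718e-03, "year10": -7.218e-03}

def year_dummies(fiscal_year):
    """Indicator (dummy) coding of the fiscal year with FY2008 omitted."""
    return {"year09": int(fiscal_year == 2009), "year10": int(fiscal_year == 2010)}

def year_shift(fiscal_year):
    """Shift in the transformed response attributable to the fiscal year alone."""
    dummies = year_dummies(fiscal_year)
    return sum(YEAR_COEF[name] * flag for name, flag in dummies.items())

for fy in (2008, 2009, 2010):
    print(fy, year_dummies(fy), year_shift(fy))
```

Because the response is time2E4 raised to a negative power, a negative shift on the transformed scale corresponds to a longer predicted time to E-4 when all other predictors are held constant.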
The descriptive statistics for the six quantitative variables included in the model
are displayed in Table 15. 


Table 15. Descriptive statistics for the quantitative variables
used in multi-year model

| Variable | Mean | Median | Standard Deviation | Minimum | Maximum |
|---|---|---|---|---|---|
| Age | 20.13 | 19.62 | 1.87 | 17.28 | 30.03 |
| WEIGHT | 163.10 | 161.00 | 27.64 | 96 | 259 |
| IST_CRUNCHES | 70.00 | 67.00 | 18.32 | 39 | 155 |
| IST_RUN | 706.60 | 717.00 | 79.62 | 450 | 918 |
| RIFLE_SCORE | 270.3 | 282.0 | 20.28 | 248 | 332 |
| CL_SCORE | 99.07 | 101.00 | 21.52 | 85 | 140 |


Further support for not pooling the data from different years to fit a common 
model can be found by inspecting side-by-side boxplots of the residuals from the 
regression broken down by year, as shown in Figure 14. 
Figure 14. Comparison of regression errors across three years of data
using boxplots with time2E4 as the outcome variable 
As seen in the comparison of the boxplots, the variance of the regression errors
decreases from 2008 to 2009, and then again from 2009 to 2010. The regression errors do 
not exhibit constant variance and violate the basic model assumptions. This decrease in 
the variance of the regression errors possibly indicates that the accuracy of the data 
improves across the years, and could be explained simply by the changing Marine Corps 
policies from year to year for Marine recruitment or changing promotion requirements. 
We conclude that the individual data sets or possibly the relationships are not 
homogeneous across years. Most importantly, this exercise suggests that this analysis 
should be repeated on an annual basis, and not pooled into a multi-year study, at least into 
the near future. 
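The comparison behind the boxplots can also be made numerically by computing the variance of the residuals within each fiscal year. The Python sketch below shows the idea; the residual values are hypothetical stand-ins chosen only to mimic the decreasing spread described above, and the function name `variance_by_group` is not from the thesis code.

```python
from collections import defaultdict
from statistics import variance

def variance_by_group(residuals, groups):
    """Sample variance of the residuals within each group (here, fiscal year)."""
    buckets = defaultdict(list)
    for resid, group in zip(residuals, groups):
        buckets[group].append(resid)
    return {group: variance(values) for group, values in sorted(buckets.items())}

# Hypothetical residuals on the transformed scale, five per fiscal year.
residuals = [-0.03, 0.02, -0.01, 0.03, -0.02,
             -0.02, 0.01, -0.01, 0.02, 0.00,
             -0.01, 0.005, 0.00, 0.01, -0.005]
years = [2008] * 5 + [2009] * 5 + [2010] * 5

by_year = variance_by_group(residuals, years)
for year, var in by_year.items():
    print(year, var)
```

Markedly different variances across years, as in this toy data, are the numerical counterpart of boxes of visibly different height in Figure 14.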

E. CHAPTER SUMMARY 

This chapter provides a detailed explanation of the four models created in order to
study the relationships between entry-level attributes of Marine recruits with the 0621
MOS and two dependent variables: the Computed Tier Score and time2E4. Statistically
significant relationships between both dependent variables and the entry-level attributes
are found to exist.
V. CONCLUSIONS AND RECOMMENDATIONS


A. CONCLUSIONS 

This thesis develops multivariate linear regression models to identify the most 
important determinants of a Marine’s advancement to the pay grade of E-4 within the 
0621 Field Radio Operator MOS. Further, we determine that these models have 
statistically significant predictive power for a Marine’s Computed Tier Score at the time 
of eligibility for re-enlistment. We present evidence that these studies should be repeated 
on an annual basis vice pooling the data into multi-year studies. Specifically, four 
questions are considered in our analysis, which are presented in this section with our 
findings. 

1. Do significant relationships exist between entry-level attributes of a USMC 
recruit and the USMC Computed Tier Score or the time for a Marine to 
achieve the pay grade of E-4? 

This study has determined that there are statistically significant relationships 
between the entry-level attributes of a Marine recruit and the USMC Computed Tier 
Score, as well as the time to achieve the pay grade of E-4 within the 0621 MOS in the 
USMC. Entry-level attributes of Marine recruits can be utilized to predict these 
dependent variables. 

2. What are the most influential independent variables that predict the 
Computed Tier Score and the rate of promotion to E-4 in the 0621 MOS? 

The most influential independent predictor variables that allow prediction of the
Computed Tier Score are found to be IST_RUN, WAIV_WEIGHT, IST_CRUNCHES,
GT_SCORE, and WEIGHT. The predicted value of Computed Tier Score increases as
IST_RUN and WEIGHT decrease, or as IST_CRUNCHES and GT_SCORE increase.
Of particular interest, CL_SCORE and MM_SCORE exhibit a decreasing relationship
with the Computed Tier Score. This result does not imply that doing well on these tests
should be a negative factor in evaluating a Marine; rather, it suggests that relationships
between the predictor variables may lead to a statistical result of this kind.
As shown in Table 13 (Chapter IV), IST_CRUNCHES, IST_RUN,
RIFLE_SCORE, GS, MK, PC, WAIV_WEIGHT, and CL_SCORE are the most
influential predictor variables used to determine success as defined in terms of the time to
achieve the pay grade of E-4. RIFLE_SCORE is the most influential predictor variable
that has a beneficial relationship to time2E4, while receiving a weight waiver prior to
entering service has the largest negative effect. IST_RUN, MK, PC, and CL_SCORE
follow RIFLE_SCORE in their positive impact on the predicted time2E4, all having
a similarly influential effect.

3. What insight does this analysis provide in terms of recommending changes to 
the current entrance criteria for the 0621 Field Radio Operator MOS? 

While IST_CRUNCHES, IST_RUN, RIFLE_SCORE, and WEIGHT provide
insight into the predicted time2E4, the relationships of time2E4 with GS, MK, PC, and
CL_SCORE merit further exploration for inclusion in the entrance criteria of a Field
Radio Operator. Interestingly, EL_SCORE, which is currently used as one of the criteria
for entry into the 0621 MOS, was not found to have a statistically significant relationship
to time2E4. This does not indicate that EL_SCORE is not a significant measure of
suitability for the 0621 MOS, but rather that other ASVAB scores may provide similar
information in predicting time2E4.

4. What direction should a future study take to examine ways in which the 
matching of USMC recruits to MOS fields can be improved? 

To explore other measures of suitability for an MOS that could help predict
a successful match, new suitability measures need to be developed. As
explained in Chapter II, the Center for Naval Analyses (CNA) developed job
performance measures for a limited number of MOSs in order to test proficiency in
performing duties as outlined by the USMC. We recommend that similar job
performance measures be created across all high-density MOSs in order to support
studies focused on matching a USMC recruit to his or her MOS. This study can then be
replicated using a metric that is focused on the quality of matching as the dependent
variable for analysis.
B. RECOMMENDATIONS FOR FUTURE WORK


Based on the findings in this study, the following future work is suggested to 
expand this field of research and the scope of our findings. 

The models and methodologies utilized in this study should be expanded to other 
high-density MOSs within the USMC. With a better understanding of the influential 
predictors within each MOS, further recommendations can be made to other MOSs 
considered. Further, an optimization of the placement of a selected pool of Marines into 
the MOSs that need to be filled would provide the USMC with a tool to improve the 
quality of matching available Marines to the MOSs. 

The USMC Manpower Database, TFDW, is a vast resource of data that can be 
used to support future studies. Data collection through the USMC database requires an 
extensive level of knowledge of the system and is not user-friendly. The improvement 
and development of a user-friendly and readily accessible database would be a significant 
advantage to those using TFDW for data analysis purposes. More specifically, the 
development of a complete and more detailed data dictionary and user interface would 
improve the availability of data. 

Further exploration and development of new performance and suitability 
measures could provide useful results when analyzing the influence of various predictor 
variables. With the development of standardized performance metrics, this study could 
then be expanded and provide further insight into the job matching problem. 